Synopsis: Poor Man’s Search Engine

Let’s learn how using search for extensive collections causes an antipattern.

We'll cover the following

Objective: Full-text search
Legitimate uses of the antipattern

I was working in a technical support job in 1995, at a time when companies were just starting to adopt the Web as a way to provide information to their customers. We had a collection of short documents describing solutions to common support questions, and we wanted to put them on the Web in a knowledge base application.

As the collection grew, I quickly realized that it needed to be searchable because customers didn’t want to browse through hundreds of articles to find what they were looking for. One strategy would be to organize the articles into categories, but even these were too large, and many articles belonged in multiple categories.

We wanted our customers to search the articles, narrowing down the list to those matching any criteria. The most flexible and straightforward interface allowed the customer to enter any set of words and show them the articles in which those words appear.

An article was weighted higher if it matched the search terms more fully. Also, we wanted to match word forms. For example, a search for the word “crash” should also match “crashed”, “crashes”, and “crashing”. Of course, the search had to work in a growing collection of documents quickly enough to be useful in a web application.

If this description sounds superfluous, it shouldn’t be surprising. Searching through text online is now so common that we can hardly recall the time before it was available. But using SQL to search by keywords while also making the solution both fast and accurate is deceptively difficult.

Objective: Full-text search#

Any application that stores text needs to search for words or phrases within that text. We use databases to store more textual data than ever, and, at the same time, we demand to be able to search text that matches our criteria at ever greater speeds. Web applications especially need high-performance and scalable database techniques for searching text.

One fundamental principle of SQL (and relational theory from which SQL is derived) is that a value in a column is atomic. That is, we can compare a value to another value, but we always compare the whole value when we do that. Comparing substrings is bound to be inefficient or inaccurate in SQL.

Despite this, we need a way to compare a short string with a longer string and to find a match when the short string occurs anywhere within the longer string. How can we achieve this using SQL?

Legitimate uses of the antipattern#

We will see that the expressions in the next lesson are legal SQL queries and that they have a straightforward and simple usage. That counts for a lot.

Performance is often important, but some queries are run so infrequently that it doesn’t make sense to invest a lot of resources to optimize them. Maintaining indexes to benefit a rarely used query could be just as costly as running that query inefficiently. If the nature of a query is ad hoc, there’s no guarantee that the index we define would benefit that query anyway.

It’s hard to use pattern-matching operators for complex queries, but if we design the patterns for simple cases, they can help us get the right results with the least fuss.

Solution: In No Particular Order

Antipattern: Pattern Matching Predicates

Mark as Completed

Report an Issue

Introduction

Logical Antipattern - Jaywalking

Logical Antipattern - Naive Trees

Logical Antipattern - ID Required

Logical Antipattern - Keyless Entry

Assessment: Logical Antipatterns - Part 1

Logical Antipattern - Entity-Attribute-Value

Logical Antipattern - Polymorphic Associations

Logical Antipattern - Multicolumn Attributes

Logical Antipattern - Metadata Tribbles

Assessment: Logical Antipatterns - Part 2

Physical Antipattern - Rounding Errors

Physical Antipattern - 31 Flavors

Physical Antipattern - Phantom Files

Physical Antipattern - Index Shotgun

Assessment: Physical Antipatterns

Query Antipattern - Fear of the Unknown

Query Antipattern - Ambiguous Groups

Query Antipattern - Random Selection

Query Antipattern - Poor Man’s Search Engine

Query Antipattern - Spaghetti Query

Query Antipattern - Implicit Columns

Assessment: Query Antipattern

Application Development Antipattern - Readable Passwords

Application Development Antipattern - SQL Injection

Application Development Antipattern - Pseudokey Neat-Freak

Application Development Antipattern - See No Evil

Application Development Antipattern - Diplomatic Immunity

Application Development Antipattern - Magic Beans

Assessment: Application Development Antipattern

Conclusion

Synopsis: Poor Man’s Search Engine

Objective: Full-text search#

Legitimate uses of the antipattern#